Astronomy in the Cloud: Using MapReduce for Image Coaddition

Authors

  • Keith Wiley
  • Andrew J. Connolly
  • Jeffrey P. Gardner
  • K. Simon Krughoff
  • Magdalena Balazinska
  • Bill Howe
  • YongChul Kwon
  • Yingyi Bu
Abstract

In the coming decade, astronomical surveys of the sky will generate tens of terabytes of images and detect hundreds of millions of sources every night. The study of these sources will involve computational challenges such as anomaly detection, classification, and moving-object tracking. Since such studies benefit from the highest-quality data, methods such as image co-addition, i.e., astrometric registration followed by per-pixel summation, will be a critical preprocessing step prior to scientific investigation. With a requirement that these images be analyzed on a nightly basis to identify moving sources such as potentially hazardous asteroids or transient objects such as supernovae, these data streams present many computational challenges. Given the quantity of data involved, the computational load of these problems can only be addressed by distributing the workload over a large number of nodes. However, the high data throughput demanded by these applications may present scalability challenges for certain storage architectures. One scalable data-processing method that has emerged in recent years is MapReduce, and in this article we focus on its popular open-source implementation called Hadoop. In the Hadoop framework, the data are partitioned among storage attached directly to worker nodes, and the processing workload is scheduled in parallel on the nodes that contain the required input data. A further motivation for using Hadoop is that it allows us to exploit cloud-computing resources, i.e., platforms where Hadoop is offered as a service. We report on our experience of implementing a scalable image-processing pipeline for the SDSS imaging database using Hadoop. This multiterabyte imaging data set provides a good testbed for algorithm development, since its scope and structure approximate future surveys. First, we describe MapReduce and how we adapted image co-addition to the MapReduce framework. Then we describe a number of optimizations to our basic approach and report experimental results comparing their performance.
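To make the idea of co-addition as a map step followed by a reduce step concrete, here is a minimal single-process sketch in plain Python. It is not the authors' pipeline and does not use Hadoop: the functions `map_image`, `reduce_coadd`, and `run_job`, the dictionary layout of images and queries, and the reduction of astrometric registration to an integer pixel offset on a shared grid are all assumptions made for illustration.

```python
from collections import defaultdict


def map_image(image, query):
    """Map step (hypothetical): project one image onto the query grid.

    Registration is simplified to an integer offset ("origin") on a shared
    pixel grid; real astrometric registration involves resampling.
    """
    qx, qy, qw, qh = query["box"]  # query origin (x, y), width, height
    ox, oy = image["origin"]
    projected = []
    for r, row in enumerate(image["pixels"]):
        for c, value in enumerate(row):
            x, y = ox + c - qx, oy + r - qy  # position inside the query grid
            if 0 <= x < qw and 0 <= y < qh:
                projected.append((y, x, value))
    if projected:  # emit nothing for images that miss the query region
        yield query["id"], projected


def reduce_coadd(query, projected_images):
    """Reduce step (hypothetical): per-pixel summation plus a coverage count."""
    _, _, qw, qh = query["box"]
    total = [[0.0] * qw for _ in range(qh)]
    depth = [[0] * qw for _ in range(qh)]
    for projected in projected_images:
        for y, x, value in projected:
            total[y][x] += value
            depth[y][x] += 1
    return total, depth


def run_job(images, query):
    """Stand-in for the framework: group mapper output by key, then reduce."""
    groups = defaultdict(list)
    for image in images:
        for key, value in map_image(image, query):
            groups[key].append(value)
    return {key: reduce_coadd(query, values) for key, values in groups.items()}


images = [
    {"origin": (0, 0), "pixels": [[1, 2], [3, 4]]},
    {"origin": (1, 0), "pixels": [[10, 20], [30, 40]]},
]
query = {"id": "q", "box": (0, 0, 3, 2)}
total, depth = run_job(images, query)["q"]
```

In this toy input the two images overlap in one column, so `depth` records two contributions there and one elsewhere. In a real MapReduce deployment the grouping done by `run_job` would be handled by the framework's shuffle phase rather than by user code.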

Related resources

Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming

The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations because the rapid development in the use of information technology in general and network technology in particular, has led to the trend of many organizations to make their applications available for use via electronic platforms hos...

A study on Ca II 854.2 nm emission in a sunspot umbra using a thin cloud model

In the present work, we introduce and explain a method of solution of the radiative transfer equation based on a thin cloud model. The efficiency of this method to retrieve dynamical chromospheric parameters from Stokes I profiles of Ca II 854.2 nm line showing spectral emission is investigated. The analyzed data were recorded with the Crisp Imaging Spectro-Polarimeter (CRISP) at Swedish 1-m So...

Job Attentive Scheduling Algorithm in Hadoop

In recent years cloud services have gained much attention as a result of their availability, scalability, and low cost. One use of these services has been for the execution of scientific workflows as part of Big Data Analytics, which are employed in a diverse range of fields including astronomy, physics, seismology, and bioinformatics. There has been much research on heuristic scheduling algori...

Poster: Cross Cloud MapReduce: an Uncheatable MapReduce

MapReduce [1] is becoming a popular data processing application on Cloud Environment. However, security issues make many customers reluctant to move their critical computation tasks to cloud. For instance, [2] points out a real security vulnerability that the cloud service leader Amazon EC2 suffers from: some members of EC2 can create and share Amazon Machine Image (AMI) to the EC2 community so...

Hadoop Mapreduce for Remote Sensing Image Analysis

Image processing algorithms related to remote sensing have been tested and utilized on the Hadoop MapReduce parallel platform by using an experimental 112-core high-performance cloud computing system that is situated in the Environmental Studies Center at the University of Qatar. Although there has been considerable research utilizing the Hadoop platform for image processing rather than for its ...

Journal:
  • CoRR

Volume: abs/1010.1015

Publication date: 2010